When will Feature Feedback help? Quantifying the Complexity of Classification Problems

Authors

  • Hema Raghavan
  • Omid Madani
  • Rosie Jones
Abstract

Supervised learning typically requires human effort to label a large number of training instances. Active learning strives to decrease the number of labeled training examples needed by actively engaging the learner and the human in an interactive process. Active learning has proven to be effective in many domains. With few training examples, past work has found that user prior knowledge on the importance of features, or interactive feature feedback, can guide the learner to converge faster, that is, with lower labeling costs. In this paper we aim to understand the kinds of problems for which such extra feedback is significantly beneficial. In other words, we ask what kinds of problems can significantly benefit from interactive learning and whether for some problems the user has no choice but to engage in the tedious process of labeling many examples. Towards this goal, we define a set of four difficulty measures, two each for instance complexity and feature complexity, for linear classification problems. These measures can be computed efficiently for real-world problems for which linear classifiers are effective, such as text classification. We quantify the difficulty of 358 text classification problems and 9 corpora using our measures, illustrating the spectrum of problems that exist in text classification in addition to quantifying results that have only been qualitatively discussed in the text classification literature. We verify the intimate relationship (a high positive correlation) between feature complexity and instance complexity using our measures. We then use these measures to understand when feature feedback is likely to be very useful. We observe that many problems in the commonly used data sets are of low to medium complexity, that is, only on the order of tens of well-selected features are required to gain most of the maximum attained performance on such concepts. We find that learning these kinds of problems especially stands to benefit from feature feedback. We note that our empirical difficulty measures and the rankings of problems and domains are of independent interest, beyond the active learning setting.

Similar resources

Presentation of quasi-linear piecewise selected models simultaneously with designing of bump-less optimal robust controller for nonlinear vibration control of composite plates

The idea of using quasi-linear piecewise models has been established on the decomposition of complicated nonlinear systems, simultaneously designing with local controllers. Since the proper performance and the final system close loop stability are vital in multi-model controllers designing, the main problem in multi-model controllers is the number of the local models and their position not payi...

Grading, no longer an obstacle to learners’ attendance to teacher feedback

Learners are often reported not to be motivated enough to attend to teacher feedback. Teachers also tend to grade learners’ writing samples when providing them with corrective feedback though they know it may divert their attention away from teacher feedback. However, not grading learner writings does not seem to be an option due to both learners’ demands for it and ins...

On the Empirical Complexity of Text Classification Problems

In order to train a classifier that generalizes well, different learning problems, in particular high-dimensional ones such as text classification, can require widely different amounts of training, as measured in terms of the number of training instances required to reach adequate accuracy or the number of features effectively utilized in the classifier. We define several measures of learning d...

Perform Three Data Mining Tasks with Crowdsourcing Process

For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...

W4: The Neuro-feedback and Treatment of Attention Problems and Anxiety

There are different treatments for the management of anxiety, such as drugs, psychotherapy, cognitive therapy, and behavior therapy. Nowadays, however, an intervention called "Neuro-feedback", which combines electronics, behavioral science, neurology, and pharmacology, has been introduced, in which neurons can be grown and reinforced and the brain's function improved. In this interv...

Publication date: 2007